Automatic Detection of Latent Common Clusters of Groups in MultiGroup Regression
نویسندگان
چکیده
We present a flexible non-parametric generative model for multigroup regression that detects latent common clusters of groups. The model is founded on techniques that are now considered standard in the statistical parameter estimation literature, namely, Dirichlet process(DP) and Generalized Linear Model (GLM), and therefore, we name it “Infinite MultiGroup Generalized Linear Models” (iMGGLM). We present two versions of the core model. First, in iMG-GLM-1, we demonstrate how the use of a DP prior on the groups while modeling the response-covariate densities via GLM, allows the model to capture latent clusters of groups by noting similar densities. The model ensures different densities for different clusters of groups in the multigroup setting. Secondly, in iMG-GLM-2, we model the posterior density of a new group using the latent densities of the clusters inferred from previous groups as prior. This spares the model from needing to memorize the entire data of previous groups. The posterior inference for iMG-GLM-1 is done using Variational Inference and that for iMG-GLM-2 using a simple Metropolis Hastings Algorithm. We demonstrate iMG-GLM’s superior accuracy in comparison to well known competing methods like Generalized Linear Mixed Model (GLMM), Random Forest, Linear Regression etc. on two real world problems.
منابع مشابه
Using Machine Learning Algorithms for Automatic Cyber Bullying Detection in Arabic Social Media
Social media allows people interact to express their thoughts or feelings about different subjects. However, some of users may write offensive twits to other via social media which known as cyber bullying. Successful prevention depends on automatically detecting malicious messages. Automatic detection of bullying in the text of social media by analyzing the text "twits" via one of the machine l...
متن کاملStock Price Prediction using Machine Learning and Swarm Intelligence
Background and Objectives: Stock price prediction has become one of the interesting and also challenging topics for researchers in the past few years. Due to the non-linear nature of the time-series data of the stock prices, mathematical modeling approaches usually fail to yield acceptable results. Therefore, machine learning methods can be a promising solution to this problem. Methods: In this...
متن کاملAn Automatic Detection of the Fire Smoke Through Multispectral Images
One of the consequences of a fire is smoke. Occasionally, monitoring and detection of this smoke can be a solution to prevent occurrence or spreading a fire. On the other hand, due to the destructive effects of the smoke spreading on human health, measures can be taken to improve the level of health services by zoning and monitoring its expansion process. In this paper, an automated method is p...
متن کاملSpike and Slab Gene Selection for Multigroup Microarray Data
DNA microarrays can provide insight into genetic changes that characterize different stages of a disease process. Accurate identification of these changes has significant therapeutic and diagnostic implications. Statistical analysis for multistage (multigroup) data is challenging, however. ANOVA-based extensions of two-sample Z-tests, a popular method for detecting differentially expressed gene...
متن کاملThe Measure of Residential Segregation of Socio-Economic Groups by Using Multigroup Indices in Shiraz City
Urban segregation has been a problem of many cities of the world. Many researchers have interested in the urban segregation issues. Urban segregation has strongly concentrated poverty and created underclass. Different types of urban segregation exist, including income and racial or ethnical segregation, and depending on the contextual mechanisms within a city. To understand and plan a better co...
متن کامل